Multi‐scale attention encoder for street‐to‐aerial image geo‐localization
نویسندگان
چکیده
Abstract The goal of street‐to‐aerial cross‐view image geo‐localization is to determine the location query street‐view by retrieving aerial‐view from same place. drastic viewpoint and appearance gap between images brings a huge challenge against this task. In paper, we propose novel multiscale attention encoder capture contextual information aerial/street‐view images. To bridge domain these two view images, first use an inverse polar transform make approximately aligned with Then, explored applied convert into feature representation guidance learnt information. Finally, global mining strategy enable network pay more hard negative exemplars. Experiments on standard benchmark datasets show that our approach obtains 81.39% top‐1 recall rate CVUSA dataset 71.52% CVACT dataset, achieving state‐of‐the‐art performance outperforming most existing methods significantly.
منابع مشابه
Large-Scale Image Geolocalization
In this chapter, we explore the task of global image geolocalization— estimating where on the Earth a photograph was captured. We examine variants of the “im2gps” algorithm using millions of “geotagged” Internet photographs as training data. We first discuss a simple to understand nearest-neighbor baseline. Next, we introduce a lazy-learning approach with more sophisticated features that double...
متن کاملMultiscale Discriminant Saliency for Visual Attention
The bottom-up saliency, an early stage of humans’ visual attention, can be considered as a binary classification problem between center and surround classes. Discriminant power of features for the classification is measured as mutual information between features and two classes distribution. The estimated discrepancy of two feature classes very much depends on considered scale levels; then, mul...
متن کاملImage denoising and restoration with CNN-LSTM Encoder Decoder with Direct Attention
Image denoising is always a challenging task in the field of computer vision and image processing. In this paper we have proposed an encoder-decoder model with direct attention, which is capable of denoising and reconstruct highly corrupted images. Our model is consisted of an encoder and a decoder, where encoder is a convolutional neural network and decoder is a multilayer Long Short-Term memo...
متن کاملRecurrent Neural Network Encoder with Attention for Community Question Answering
We apply a general recurrent neural network (RNN) encoder framework to community question answering (cQA) tasks. Our approach does not rely on any linguistic processing, and can be applied to different languages or domains. Further improvements are observed when we extend the RNN encoders with a neural attention mechanism that encourages reasoning over entire sequences. To deal with practical i...
متن کاملUnsupervised Multiscale Image Segmentation
We propose a general unsupervised multiscale featurebased approach towards image segmentation. Clusters in the feature space are assumed to be properties of underlying classes, the recovery of which is achieved by the use of the mean shift procedure, a robust non-parametric decomposition method. The subsequent classification procedure consists of Bayesian multiscale processing which models the ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: CAAI Transactions on Intelligence Technology
سال: 2022
ISSN: ['2468-2322', '2468-6557']
DOI: https://doi.org/10.1049/cit2.12077